Understanding the mutual relationships between information flows and social activity in society
today is one of the cornerstones of the social sciences. In financial economics, the key issue in this
regard is understanding and quantifying how news of all possible types (geopolitical, environmental,
social, financial, economic, etc.) affect trading and the pricing of firms in organized stock markets.
In this paper we seek to address this issue by performing an analysis of more than 24 million
news records provided by Thomson Reuters and of their relationship with trading activity for 205
major stocks in the S&P US stock index. We show that the whole landscape of news that affect
stock price movements can be automatically summarized via simple regularized regressions between
trading activity and news information pieces decomposed, with the help of simple topic modeling
techniques, into their "thematic" features. Using these methods, we are able to estimate and quantify
the impacts of news on trading. We introduce network-based visualization techniques to represent
the whole landscape of news information associated with a basket of stocks. The examination of
the words that are representative of the topic distributions confirms that our method is able to
extract the significant pieces of information influencing the stock market. Our results show that
one of the most puzzling stylized facts in financial economics, namely that at certain times trading
volumes appear to be "abnormally large," can be explained by the flow of news. In this sense, our
results prove that there is no "excess trading," if the news are genuinely novel and provide relevant
financial information.


I. INTRODUCTION 


Neoclassical financial economics based on the "efficient market hypothesis" (EMH) considers price movements as
almost perfect instantaneous reactions to information flows. Thus, according to the EMH, price changes simply reflect
exogenous news. Such news - of all possible types (geopolitical, environmental, social, financial, economic, etc.) - lead
investors to continuously reassess their expectations of the cash flows that firms' investment projects could generate in
the future. These reassessments are translated into readjusted demand/supply functions, which then push prices up or
down, depending on the net imbalance between demand and supply, towards a fundamental value. As a consequence,
observed prices are considered the best embodiments of the present value of future cash flows. In this view, market
movements are purely exogenous without any internal feedback loops. In particular, the most extreme losses occurring
during crashes are considered to be solely triggered exogenously.

The problem with this paradigm is that, in practice, relating actual price movements to particular news has been
strikingly elusive. Many attempts to relate price changes to news, be it low frequency or high frequency, have failed
to find convincing supportive evidence for the EMH [1-6]. Moreover, it has long been recognized that prices move
to much too large an extent and trading volume is much too large compared with what would be predicted from the
EMH. This suggests that there is more to price dynamics than just the exogenous flow of information. Against
this background, the concept of "reflexivity" has been introduced [10], which embodies the notion that past actions
of investors also significantly influence present decisions so as to create feedback loops and significant endogenous
dynamics [11]. The unresolved issue until now is then to disentangle exogenous and endogenous factors and understand
which news are really important and how they are incorporated in prices. Given the a priori foundational nature of
news flows on price formation in financial economics on the one hand and the absence of empirical support for it on
the other hand, without such an understanding and the corresponding control that should derive from it, financial
markets will remain vulnerable to the excess volatility, wild price swings, bubbles and crashes that have plagued them
in recent years as well as over most of their history [12].

FIG. 1: Comparison between the time evolution of trading volume (black continuous line) and aggregate news volume (red
dashed line) for Toyota. The inset plots the trading volume as a function of the concomitant news volume.

The present paper represents an attempt to break the above stalemate by (i) using a huge database of business news
gathered for institutional investors and (ii) introducing a new methodology to extract relevant news that influence
trading activity. This new methodology allows us to remove in large part the endogenous components of price dynamics
and to identify a hierarchy of important news. Our approach differs in several important dimensions from the ones
employed by previous studies investigating the impact of news on financial markets, such as [13]-[19]. One class of
previous studies analyzed the information provided by news only in an aggregated manner without taking into account 
the specific information content. However, as casual observation indicates, each news record has different meaning to 
investors and thus different impact on prices, so that just counting the total number of news records for a particular 
period would not work well. Other previous studies only considered a small restricted set of news, such as earnings 
reports and the release of new economic data, and thus suffered from the serious limitation of neglecting the possible 
significant impact of other types of news arriving at the time. One way to circumvent the latter problem could be 
to use very short time intervals [l^l so as to minimize attribution errors. But recent studies, including [SiEJl, have 
shown that the impact of news persists over days, weeks and sometimes months, making it difficult, if not impossible, 
to extract their influence by just using temporal partitioning. 

We address all these problems by performing a simultaneous disaggregated estimation of the relevant news types 
with respect to financial trading activity. We mine raw texts of more than 24 million news records provided by 
Thomson Reuters and examine their impact on trading activity in stocks of the 205 firms listed in the S&P 500 US
stock index for each of which there were more than 5,000 news records over the period from January 2003 to June 
2011. To determine what pieces of information are the most relevant to explain trading activity of each stock, we 
use a combination of regularized regressions and topic modeling techniques. This allows us to compare quantitatively 
the relative importance of the different news. We show that the top 5% most important events in terms of trading 
volume can be almost perfectly explained by our decoded news flow. 

II. METHODOLOGY 

The existence of a good correspondence between the time evolution of trading activity (measured by the daily 
trading volume) and the time evolution of news volume is well-known [13]-[15]. This correspondence is illustrated in
Fig. 1, which shows the time evolution of the trading volume (the number of shares traded per day) of the Toyota 
stock and the evolution of the volume of news, measured as the number of words per day in text records that include 
the company name Toyota. Using just the number of news records (instead of the total number of words in these 
records) yields essentially the same results. 

Starting from this rough aggregate correspondence, our much more ambitious goal is to disaggregate (a) the flow 
of news into relevant topics and their associated words and (b) the trading volume of individual stocks, in order to 
construct a complete network of interdependences. Fig. 2 provides a flowchart of our methodology, which consists 
of (i) decomposing the total flow of news into their thematic features by applying topic modeling techniques, (ii) 
estimating their impact on trading activity simultaneously in order to prune out the unimportant topics, and (iii) 
quantifying how many of the peaks in trading activity can be explained by news shocks. 

FIG. 2: Flowchart summarizing the procedure followed in our analyses. The number in parentheses indexes the step. Step (1)
selects the news records associated with a given term, here the name of a company, such as Toyota. Step (2-a) applies the
Latent Dirichlet Allocation (LDA) that decomposes any document as a mixture of different topics. Step (3-a) implements a
constrained LASSO regression. The percentage shown in step (3-b) denotes the estimated impact of each topic. The percentage
shown in step (4) is the "fraction of (trading volume) peaks explained" (FPE) by news, which is our metric to assess the quality
of our methodology (see text).

Once a term (for instance Toyota) is chosen and the associated news records are collected (step (1)), the second step
is to decompose news information pieces into their "thematic" features, as shown in Fig. 2. This is done by applying
a simple topic modeling technique called Latent Dirichlet Allocation (LDA) [22], [23]. Topic models are graphical
models [24] which assume that shared global multinomial word distributions (i.e., topic distributions) govern the
corpus. Word frequencies within a given document are created from a mixture of these global topic distributions.
LDA is the simplest topic model and uses the Dirichlet prior in order to ensure sparsity in the underlying multinomial
distribution. This makes learned topics easier to interpret. Since LDA has already yielded excellent results, we did
not find it useful to employ more elaborate topic models. We removed common stop words from the original news
records and ran LDA by setting the number of topics to 100 for all stocks analyzed in this paper. Varying the number
of topics according to the number of news records for each stock did not change the result significantly. We used the
fast implementation of Smola and Narayanamurthy [25].
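
To make the tagging step concrete, the following is a minimal sketch of a collapsed Gibbs sampler for LDA in pure Python. It is not the implementation of Smola and Narayanamurthy used in this paper; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the number of iterations are illustrative assumptions. It only shows how every word ends up tagged with exactly one topic.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the paper's code).

    docs: list of documents, each a list of word strings (stop words removed).
    Returns (assignments, topic_word): assignments[d][i] is the topic tagged
    to the i-th word of document d; topic_word[k] counts words tagged with k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Random initial tagging: every word gets exactly one topic.
    for d, doc in enumerate(docs):
        tags = []
        for w in doc:
            k = rng.randrange(n_topics)
            tags.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(tags)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current tag of this word from all counts.
                k_old = assignments[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1
                # Conditional probability of each topic for this word,
                # smoothed by the Dirichlet hyperparameters alpha and beta.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                k_new = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1
    return assignments, topic_word
```

The returned `assignments` are what a per-topic word count needs as input; at the scale of the corpus analyzed here a parallel implementation would be required, which this sketch does not attempt.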

In what follows, we use the news volume I_k(t) of a given topic k, which is defined as the total number of words
tagged with topic number k on day t,

$$I_k(t) = \sum_{d \in I(t)} \sum_{w} N(d, w, k), \qquad (1)$$

where N(d, w, k) is the number of times a word w tagged with topic k appeared in document d and I(t) is the indicator
function of the set of documents on day t. Fig. 3 presents some examples of the time evolution of the news volume
for four topics for the term "Toyota." It also lists the top three words of the corresponding topic distributions. A full
description is provided in the supporting information.
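
As a small illustration of equation (1), the sketch below counts, for each topic and day, the words tagged with that topic. The data layout (a list of `(day, words)` pairs with a parallel list of topic tags) and the name `topic_news_volume` are assumptions made for this example.

```python
from collections import defaultdict


def topic_news_volume(documents, assignments):
    """Compute I_k(t): number of words tagged with topic k on day t.

    documents: list of (day, words) pairs, one per news record.
    assignments: list of topic tags, one list per document, aligned with words.
    Returns a nested dict volume[k][day] -> word count.
    """
    volume = defaultdict(lambda: defaultdict(int))
    for (day, words), tags in zip(documents, assignments):
        if len(words) != len(tags):
            raise ValueError("each word needs exactly one topic tag")
        # Summing N(d, w, k) over all words w of document d amounts to
        # counting the word occurrences in d that carry tag k.
        for k in tags:
            volume[k][day] += 1
    return volume
```

For instance, two records on the same day with tags `[0, 0, 1]` and `[1]` give a volume of 2 for topic 0 and 2 for topic 1 on that day.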

FIG. 3: Selected topics learned by LDA and the associated news volume estimated using equation (1) for the term "Toyota."
The top three words for these topics were: (a) Toyota, recall, safety; (b) financial, crisis, economy (panel title: "Global
Recession"); (c) Japan, production, earthquake; (d) team, F1, race.

The fundamental characteristic of LDA (and of topic modeling in general) is that every word that appears in
the corpus is tagged with a specific topic and is thus assumed to be generated by the corresponding specific topic
distribution. Put differently, even though words in a given document can be generated by a mixture of topics, each
word is assumed to be drawn from exactly one topic. This procedure makes the interpretation of the estimated topics
easier to comprehend [26]. As highlighted by [27], this construction, however, has the following negative consequence:
because news records, such as ours, have many repeated phrases such as "double click for more information," "Reuters
messaging net," or "top news," many topic distributions simply reflect these repeated phrases. One way to deal with
this problem is to eliminate these repeated phrases where they appear in the original corpus. However, because it
is difficult to construct an algorithm that would work well for all the variations found in the huge amount of news
records analyzed here, we chose to prune the topics using topic distributions, employing the following procedure. For
each topic, we focused on the top 6 words of the corresponding topic distribution and eliminated that topic if more
than one of these top 6 words belonged to the set of words in the unwanted repeated phrases (step (2-b) in Fig. 2;
the word lists are given in Section VI). We also removed all
topics that appear for less than 80 days (out of the 3103 days from January 2003 to June 2011). This excludes topics
such as specific symbols and numbers reported in short time intervals. We also eliminated topics that describe stock
market activity, i.e., which include words such as "hot," "stocks," "markets," etc., in order to focus on the underlying
news information that is supposed to influence that stock. This procedure corresponds to filtering out the endogenous
component underlying the information flow and price generating process. Thus, for "Toyota," for example, out of the
original 100 topics, we are left with 34 useful topics to work with that are associated with the term.
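
The pruning rules can be sketched as a simple filter. The word lists below are copied from the exact pruning procedure of step (2-b) in Section VI; the treatment of month names (full names, three-letter abbreviations, and "sept") and of numbers (`str.isdigit`) is an assumption, since the text only describes them loosely, and the function and parameter names are hypothetical.

```python
REPEATED_PHRASE_WORDS = {
    "top", "news", "reutersnet", "messaging", "double", "click", "words",
    "moved", "update", "stories", "press", "corp", "services", "trademark",
}
MARKET_WORDS = {
    "stock", "stocks", "index", "futures", "market", "rose", "fell",
    "buy", "sell", "nasdaq", "nyse", "dow", "target",
}
_MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]
# Assumption: month names count both in full and abbreviated form.
MONTH_WORDS = set(_MONTHS) | {m[:3] for m in _MONTHS} | {"sept"}


def is_market_word(word):
    """Market words, month names, and plain numbers all count as market-like."""
    return word in MARKET_WORDS or word in MONTH_WORDS or word.isdigit()


def keep_topic(top_words, active_days, min_days=80):
    """Return True if a topic survives the pruning rules.

    top_words: words of the topic distribution, most probable first.
    active_days: number of days on which the topic appears.
    """
    top6 = top_words[:6]
    n_repeated = sum(w in REPEATED_PHRASE_WORDS for w in top6)
    n_market = sum(is_market_word(w) for w in top6)
    # A topic is dropped if more than one top-6 word is on either list,
    # or if it appears on fewer than min_days days.
    return n_repeated <= 1 and n_market <= 1 and active_days >= min_days
```

Applied to the Toyota listing in Section VII, this rule would drop Topic 12, whose top 6 words include "news," "click," and "double," whatever its number of active days.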

The relative importance of each topic in explaining trading volume of a given stock is determined by a simple
LASSO regression [28]-[30] with positive constraints:

$$\mathrm{Vol}(t) = \sum_{k=1}^{K} w_k I_k(t) + \epsilon(t), \qquad w_k \geq 0, \qquad (2)$$

where Vol(t) is the de-trended trading volume at time t, I_k(t) is the news volume of topic k defined in equation (1),
w_k are the regression weights, and \epsilon(t) is the residual term. De-trending of the trading volume is performed by subtracting
the minimum recorded volume observed in the last 100 trading days (boundary values are set to the nearest non-zero
value). The regularized linear regression with mean-squared error provides a robust estimation of the relationship
between news topics and trading volume in the presence of large bursts of trading activity and news, so that a larger
span of activity sizes can contribute to the determination of the regression weights {w_k}. The regularization parameter
used in the LASSO regression was chosen equal to the mean value of the regularization parameter over one hundred
ten-fold cross validations. Ten-fold cross validation was performed by randomly dividing the entire data set into ten
subsets and measuring the average mean-squared error of each testing set from the ten-fold cross validation. This
procedure was performed multiple times to ensure stability of the estimated regularization parameter.
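
A minimal sketch of the two ingredients of regression (2) is given below: trailing-minimum de-trending and a LASSO with non-negative weights solved by coordinate descent. The solver choice, the scaling of the objective by 1/(2n), the inclusion of the current day in the trailing window, and all function names are assumptions; the boundary rule and the cross-validated choice of the regularization parameter described above are not implemented.

```python
def detrend(volume, window=100):
    """Subtract the minimum volume seen over the trailing window.

    Assumption: the window includes the current day and is simply
    truncated at the start of the series.
    """
    out = []
    for t in range(len(volume)):
        start = max(0, t - window + 1)
        out.append(volume[t] - min(volume[start:t + 1]))
    return out


def nonneg_lasso(X, y, lam, n_sweeps=500):
    """Coordinate descent for a LASSO with positive constraints.

    Minimizes (1/2n) * sum_t (y[t] - sum_k w[k] * X[t][k])**2
              + lam * sum_k w[k]   subject to w[k] >= 0.
    X: list of rows, X[t][k] = news volume of topic k on day t.
    y: list of de-trended trading volumes.
    """
    n, n_topics = len(X), len(X[0])
    w = [0.0] * n_topics
    resid = list(y)  # residual y - Xw, with w = 0 initially
    col_sq = [sum(X[t][k] ** 2 for t in range(n)) / n for k in range(n_topics)]
    for _ in range(n_sweeps):
        for k in range(n_topics):
            if col_sq[k] == 0.0:
                continue  # topic never appears; its weight stays at zero
            # Correlation of topic k with the residual that excludes topic k.
            rho = sum(X[t][k] * resid[t] for t in range(n)) / n + col_sq[k] * w[k]
            # Shrink by lam, then clip at zero to enforce w[k] >= 0.
            w_new = max(0.0, (rho - lam) / col_sq[k])
            delta = w_new - w[k]
            if delta != 0.0:
                for t in range(n):
                    resid[t] -= delta * X[t][k]
                w[k] = w_new
    return w
```

The clipping at zero is what prunes topics: any topic whose correlation with the remaining residual falls below `lam` receives a weight of exactly zero.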
FIG. 4: The black line shows the de-trended trading volume of Toyota stock for the period from January 2003 to June 2011.
The red dots indicate the "peak days" selected by the method described in the text. There are 119 "peak days" for the entire 
period from January 2003 to June 2011. 

Rank 1: (toyota) (recall) (safety) - FVE: 0.18
Rank 2: (financial) (crisis) (economy) - FVE: 0.18
Rank 3: (profit) (yen) (billion) - FVE: 0.16
Rank 4: (japan) (production) (earthquake) - FVE: 0.14
Rank 5: (economy) (japan) (percent) (boj) - FVE: 0.06
Rank 6: (car) (fiat) (renault) - FVE: 0.05
Rank 7: (gm) (ford) (chrysler) - FVE: 0.05
Rank 8: (yen) (percent) (gmt) - FVE: 0.05
Rank 9: (yen) (percent) (billion) - FVE: 0.05
Rank 10: (steel) (percent) (nippon) (prices) (demand) - FVE: 0.04
Rank 11: (hybrid) (cars) (car) - FVE: 0.02
Rank 12: (pct) (company) (earnings) - FVE: 0.01

FIG. 5: List of the 12 selected topics for "Toyota" with their estimated "fraction of volume explained" (FVE). Topic distributions
are summarized by their 3-5 most frequent words. For a full description of the topic distributions, see the supporting
information.

Because researchers are generally interested in explaining large (or "abnormal") market activity, we focus our
attention on "peak days," defined in terms of the 95th percentile of daily trading volume, so that on 95% of the days
the trading volume was smaller than during the peak days. To account for the non-stationarity and large growth of
trading volume over the study period (January 2003 to June 2011), we divided the overall period into 17 six-month
time windows and identified the "peak days" for each of the 17 time windows separately. This amounts to considering
17 different distributions of daily volumes, which are each approximately stationary in their corresponding six-month
time windows. The sequence of peak days is shown in Fig. 4. For each term such as "Toyota," the fraction of the
corresponding estimated news volume that can be explained by each topic via regression, restricting our attention
to only the news volume found on "peak days," is referred to as the "fraction of volume explained" (FVE). We find
that only a subset of the useful topics as defined above is necessary to reach a total FVE of 99%. For example, K = 12
out of the 34 useful topics are sufficient to reach a total FVE of 99% for "Toyota." Fig. 5 provides a list of these
12 topics and their individual FVEs for "Toyota." Inspection of this list shows that our procedure yields sensible
results, and unimportant topics such as "Formula One" shown in Fig. 3 are correctly pruned out.
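
The peak-day selection and the FVE can be sketched as follows. The nearest-rank definition of the 95th percentile, the reading of FVE as each topic's share of the fitted volume on peak days, and all function names are assumptions made for illustration; the text does not spell out these details.

```python
def peak_days(volume, window_ids, pct=95):
    """Indices of days whose volume exceeds the pct-th percentile of their window.

    volume: daily trading volumes.
    window_ids: window label of each day (e.g. 0..16 for 17 six-month windows).
    Assumption: nearest-rank percentile, computed per window.
    """
    by_window = {}
    for t, wid in enumerate(window_ids):
        by_window.setdefault(wid, []).append(t)
    peaks = []
    for days in by_window.values():
        ranked = sorted(volume[t] for t in days)
        rank = -(-pct * len(ranked) // 100)  # integer ceiling of pct*n/100
        cutoff = ranked[max(0, rank - 1)]
        peaks.extend(t for t in days if volume[t] > cutoff)
    return sorted(peaks)


def fraction_volume_explained(weights, topic_volume, peaks):
    """Share of the fitted peak-day volume contributed by each topic.

    weights[k] is the regression weight w_k; topic_volume[k][t] is I_k(t).
    """
    contrib = [
        w * sum(series[t] for t in peaks)
        for w, series in zip(weights, topic_volume)
    ]
    total = sum(contrib)
    return [c / total if total > 0 else 0.0 for c in contrib]


def smallest_topic_set(fve, target=0.99):
    """Topics, largest FVE first, needed to reach the target total FVE."""
    order = sorted(range(len(fve)), key=lambda k: fve[k], reverse=True)
    chosen, acc = [], 0.0
    for k in order:
        chosen.append(k)
        acc += fve[k]
        if acc >= target:
            break
    return chosen
```

Under this reading, the individual FVEs listed in Fig. 5 add up to 0.99, consistent with the statement that K = 12 topics reach a total FVE of 99% for "Toyota."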
Fig. 6 compares the observed trading volume with the fitted trading volume using regression (2) (without the
residual term \epsilon(t)) for four stocks: Toyota, Yahoo, Best Buy, and BP. While some parts exhibit a good match,
other parts show some discrepancy. To quantify the quality of the regression and explanatory power of the topic
decomposition, we focus on the "peak days" previously defined and shown in Fig. 4. We define a success as a peak day
on which the predicted volume is at least equal to 30% of the observed trading volume. The fraction of
peak days among the total number of peak days over the entire period from January 2003 to June 2011 whose volume
is successfully accounted for in this sense is referred to as the "fraction of peaks explained" (FPE). We obtain the
following values: FPE=0.70 (the total number of explained peak days is 83 out of 119) for Toyota, FPE=0.94 (the
total number of explained peak days is 112) for Yahoo, FPE=0.98 (the total number of explained peak days is 117)
for Best Buy, and FPE=0.98 (the total number of explained peak days is 117) for BP.
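
The FPE metric reduces to a short computation, sketched below under the 30% success criterion stated above. The function name and argument layout are illustrative assumptions.

```python
def fraction_peaks_explained(predicted, observed, peaks, threshold=0.30):
    """Fraction of peak days whose fitted volume reaches the threshold.

    predicted: fitted trading volume from the regression (no residual term).
    observed: observed (de-trended) trading volume.
    peaks: indices of the peak days.
    Returns (FPE, list of explained peak-day indices).
    """
    if not peaks:
        raise ValueError("no peak days given")
    explained = [t for t in peaks if predicted[t] >= threshold * observed[t]]
    return len(explained) / len(peaks), explained


# Consistency check on the Toyota figures quoted in the text:
# 83 explained peak days out of 119 gives 0.697..., i.e. 0.70 after rounding.
toyota_fpe = round(83 / 119, 2)
```

A day counts as explained even if the fit falls well short of the observed peak, as long as it reaches 30% of it, so the FPE measures whether news flagged the event rather than whether the volume was matched exactly.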




FIG. 6: Estimated (red dashed line) and actual (black continuous line) trading volume for the four companies: (a) Toyota, (b) 
Yahoo, (c) Best Buy, and (d) BP. The number K of sufficient topics selected to reach a total "fraction of volume explained" 
(FVE) of 99% is 12 for Toyota, 5 for Yahoo, 4 for Best Buy, and 12 for BP. 

The quality of our regression exercise can be further assessed by comparing the results with those obtained using
reference nulls. Specifically, we swap the news associated with different companies. For example, we use the news
records associated with BP and use the extracted topics in regression (2) in order to explain the trading volume of
Yahoo (left panel of Fig. 7) and use the news records associated with Yahoo to explain the trading volume of Best
Buy (right panel of Fig. 7). This corresponds to modifying only step (1) in the flowchart shown in Fig. 2, while all
the other steps remain the same. As seen in Fig. 7, even when allowing negative coefficients, the explanatory power
decreases considerably, as for instance illustrated by the fact that the FPE is exactly in both cases. This substantial
decrease in explanatory power is found in all our tests and confirms that our regressions done at the daily scale
perform well in pruning out unimportant topics and identifying the relevant ones. Obviously, (i) if the two companies
for which news records are swapped have some commonalities (e.g., they are engaged in merger talks), or (ii) if they
always disclose their earnings reports on exactly the same date throughout the entire observation period, then some
topics found for one stock would explain the trading activity of the other, but this is rarely the case.

FIG. 7: (Left panel) Comparison between the estimated and actual trading volume when using topics from BP when trying
to explain Yahoo trading volume. (Right panel) Comparison when using topics from Yahoo when trying to explain Best Buy
trading volume. In these regressions, to increase the flexibility of the model, we even allowed (unrealistic) negative coefficients.
Notice the much reduced quality of the regressions compared with those presented in Fig. 6, illustrated by their FPEs, which
are exactly in both cases.



III. RESULTS 



We applied the methodology introduced in the previous section to the 205 companies listed in the S&P 500 US 
stock index for which there were more than 5,000 news records during the period from January 2003 to June 2011. 
Fig. 8 plots the FPE metric as a function of the number of news records for these 205 stocks. Overall, except for 
a dozen companies, the obtained FPE values are higher than for Toyota (which is represented by the red triangle). 
Since the news analyzed in this paper is all in English, this probably indicates a better match between US firms and 
US news than between international firms and US news. 

Over the set of the 205 analyzed US stocks, 987 topics were found to have a significant impact on trading activity. 
Recalling that the logic of topic models, as highlighted by [1^, is that corpus meanings are organized in topics that 
share global multinomial word distributions, a convenient way to visualize the similarities between topics is to use 
network graphs. We therefore construct networks with topics as nodes, and a link between two topics exists when the 
Jensen-Shannon Divergence (JSD) [31] between the two corresponding topic distributions is smaller than 0.5. The
size of a node is set to be proportional to the "fraction of volume explained" (FVE) by that topic and the thickness 
of a link is equal to 1 minus the JSD metric for the two linked topics. Each topic is labeled by its top three most 
frequent words, as quantified by the topic distribution, together with the company's name. The networks are depicted 
using the Force Atlas algorithm in the freely available software Gephi [34].
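
The network construction can be sketched as follows. The use of base-2 logarithms (so that the JSD lies between 0 and 1) is an assumption, as the text does not state the base; the function names and the dictionary representation of topic distributions are also illustrative.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence between two word distributions.

    p, q: dicts mapping words to probabilities (each summing to 1).
    Assumption: base-2 logarithms, so the result lies in [0, 1].
    """
    total = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mid = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            total += 0.5 * pw * math.log2(pw / mid)
        if qw > 0:
            total += 0.5 * qw * math.log2(qw / mid)
    return total


def topic_network(topics, max_jsd=0.5):
    """Build weighted links between similar topics.

    topics: dict mapping a topic label to its word distribution.
    Returns a list of (label_a, label_b, weight) with weight = 1 - JSD,
    keeping only pairs whose JSD is smaller than max_jsd.
    """
    labels = sorted(topics)
    edges = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            d = jsd(topics[a], topics[b])
            if d < max_jsd:
                edges.append((a, b, 1.0 - d))
    return edges
```

The resulting edge list, together with node sizes set from the FVEs, is the kind of input a layout tool such as Gephi can draw.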

Fig. 9 shows the network of topics for the two stocks Microsoft and Yahoo. Both have topics reflecting earnings
reports and exhibit features that reflect a potential merger deal. From the node sizes (proportional to their FVEs),
one can clearly see that the potential merger deal between the two companies had more impact on Yahoo's stock than
on Microsoft's stock. This is in agreement with the fact that Yahoo was facing difficulties in 2009. This demonstrates
another useful property of our method, which is that it allows us to quantify and compare the impact of two or more
external influences.

Fig. 10 shows the whole network of all the topics extracted by our method for the 205 stocks we focus on. The
network can be viewed as consisting of the "mainland" and more isolated "islands." The mainland is made up of
all the connections between topics produced by words reflecting earnings reports ("profit," "earning," "share," "pct
(short for percent)"), credit ratings ("rating," "debt," "credit"), merger deals ("merger," "deal"), and the financial
crisis ("crisis," "financial"). In order to better discern some of the major "islands," Fig. 11 presents six zooms on
the domains indicated by the arrows in Fig. 10. The observed clusters of company names and words representing the
topic distributions confirm that our method successfully extracted the correct information. Note that all the word
contents of the constructed topic distributions have financial and/or economic meaning that carries useful information
from the point of view of an investor and can be surmised to indeed have an impact on the future earnings of the
firms. We refer in particular to the following word contents: "earning reports," "retailers profits," "drug patents,"
"national defense budget," "new products," "merger deal," "global recession," "natural disasters," and so on. Hence,
we conclude that we have successfully extracted the important pieces of information that influence financial markets.

FIG. 8: The "fraction of peaks explained" (FPE) as a function of the number of news records for the 205 stocks in the S&P
500 for which there were more than 5,000 news records during the period from January 2003 to June 2011. The data point
for Toyota, which as a foreign company of course is not a component of the S&P 500, has been added and is shown as the red
triangle.

FIG. 9: Network extracted for Microsoft and Yahoo, where nodes are topics and links between two topics quantify the degree
of similarity associated with their word distributions.

FIG. 10: Network of topics extracted for the 205 US companies of the S&P 500 index studied here, with links between two
topics quantifying the degree of similarity associated with their word distributions, as explained in the text. The six red arrows
depict the zones that are magnified in Fig. 11.



IV. CONCLUDING REMARKS 



In this study, we performed an analysis of more than 24 million news records provided by Thomson Reuters and of
their relationship with trading activity of the stock of 205 major firms included in the S&P 500 index. We showed that 
the whole landscape of the news that affect stock price movements can be automatically summarized by conducting a 
simple regularized regression between trading activity and news information pieces decomposed into their "thematic" 
features, with the help of simple topic modeling techniques. Using these methods, not only were we able to extract 
the pieces of information that synchronize well with trading activity but, as a bonus of the simultaneous regressions, 
we were also able to estimate and quantify their impact, which is difficult to do otherwise. We also introduced novel 
ways to visualize the whole landscape of news information associated with a basket of stocks by utilizing network 
visualization techniques. The examination of the words that are representative of the topic distributions confirmed 
that our method successfully extracted the significant pieces of information influencing the stock market. 

The high explanatory power of news for stock market trading activity that we showed in our analysis points at an 
answer to the questions raised at the outset, namely, what news influence the stock market and how they are digested 
in stock prices. In particular, our results show that the large volume of trading can be explained by the flow of news. 
In this sense, our results suggest that there is no "excess trading," if the news are genuinely novel and provide relevant 
financial information. 

One of the reasons for the success of our simple methodology, which does not require taking into account lag 
effects or more sophisticated nonlinear dynamics, is probably the high quality of the news sources, which resulted 
in a high signal-to-noise ratio. Specifically, the news that we used are gathered for professional investors, who
incentivize the collecting firm by paying significant subscription fees. Our study confirms the exceptional relevance 
of such professional financial sources compared with other standard textual information such as tweets or blogs. The 
size of our database in terms of the number of news records compared with that available from standard newspapers 
was also essential for the extraction of the important topics that influence the trading activity of financial markets. 
In conclusion, we believe that our results summarize the major sources of external influences on financial markets 
stemming from news information associated with them. Another challenge beyond explaining trading activity is to 
explain pricing and financial valuations in general, using the extended universe of news, topics, and their networks. 
This is left for future work. 
FIG. 11: Six magnifications of the "islands" indicated by the arrows in the network of topics shown in Fig. 10, with links
between two topics quantifying the degree of similarity associated with their word distributions. Each node is accompanied by
the name of the company and its top three most frequent words, as quantified by the topic distribution. The size of a node is
set to be proportional to the "fraction of volume explained" (FVE) by that topic and the thickness of a link is equal to 1 minus
the JSD metric for the two linked topics. Panel (a) shows the network associated with retail sales of clothing companies; panel
(b) that associated with drugs and patents; panel (c) that associated with products in the telecommunication business; panel (d)
that associated with tobacco lawsuits; panel (e) that associated with the national defense budget; panel (f) that associated with
the potential Comcast Disney merger in 2004.
VI. EXACT PRUNING PROCEDURE OF STEP (2-B)

A. Topics reflecting repeated phrases 

If the top 6 words in the topic distribution included more than one of the words listed below, we defined the topic
distribution as only reflecting repeated news phrases. Namely, they are "top", "news", "reutersnet", "messaging",
"double", "click", "words", "moved", "update", "stories", "press", "corp", "services", and "trademark".



B. Topics reflecting market words 

If the top 6 words in the topic distribution included more than one of the words listed below, we defined the
topic distribution as only reflecting market news. Namely, they are "stock", "stocks", "index", "futures", "market",
"rose", "fell", "buy", "sell", "nasdaq", "nyse", "dow", "target", all sorts of numbers, and names of months (i.e., "jan",
"january", ...).



VII. FULL DESCRIPTION OF LEARNED TOPICS FOR "TOYOTA" 

We show the top 10 words for all the 100 topic distributions learned for "Toyota". The number in parentheses
gives the probability of each word in the topic distribution. We also show the time evolution of news volume for
all the 100 topics learned for "Toyota" (Fig. 1).

Topic 0: (ethanol, 0.006) (oil, 0.006) (reuters, 0.005) (government, 0.004) (sugar, 0.004) (police, 0.004) (minister, 
0.003) (president, 0.003) (thai, 0.003) (iraq, 0.003) 

Topic 1: (toyota, 0.069) (recall, 0.020) (safety, 0.015) (vehicles, 0.012) (recalls, 0.008) (reuters, 0.007) (problems, 
0.007) (toyoda, 0.006) (acceleration, 0.005) (million, 0.005) 

Topic 2: (notified, 0.025) (sept, 0.016) (aug, 0.012) (company, 0.010) (feb, 0.010) (group, 0.009) (march, 0.008) 
(acquire, 0.007) (joint, 0.007) (oct, 0.007) 

Topic 3: (percent, 0.038) (yen, 0.029) (nikkei, 0.013) (1, 0.010) (stocks, 0.010) (shares, 0.009) (maker, 0.009) 
(investors, 0.008) (rose, 0.007) (earnings, 0.007) 

Topic 4: (pct, 0.106) (100, 0.053) (incr, 0.043) (kfw, 0.027) (50, 0.020) (eib, 0.020) (rabobank, 0.013) (5125, 0.010)
(70, 0.008) (200, 0.008) 

Topic 5: (asia, 0.010) (stocks, 0.009) (asian, 0.008) (hong-kong, 0.008) (results, 0.008) (earnings, 0.008) (top, 0.007) 
(data, 0.007) (singapore, 0.006) (markets, 0.006) 

Topic 6: (pct, 0.110) (100, 0.064) (50, 0.030) (incr, 0.016) (9998, 0.016) (475, 0.014) (55, 0.013) (525, 0.012) (unit, 
0.009) (200, 0.008) 

Topic 7: (yen, 0.044) (percent, 0.033) (gmt, 0.016) (profit, 0.009) (1, 0.007) (billion, 0.007) (maker, 0.007) (company, 
0.006) (corp, 0.006) (report, 0.006) 

Topic 8: (percent, 0.046) (yen, 0.040) (nikkei, 0.014) (fell, 0.014) (shares, 0.011) (exporters, 0.009) (stocks, 0.009) 
(financial, 0.008) (market, 0.007) (lost, 0.007) 

Topic 9: (china, 0.029) (car, 0.017) (year, 0.012) (market, 0.011) (auto, 0.011) (percent, 0.011) (cars, 0.010) (chinese, 
0.009) (india, 0.009) (000, 0.009) 

Topic 10: (pct, 0.105) (100, 0.048) (50, 0.046) (incr, 0.045) (575, 0.024) (bng, 0.020) (60, 0.017) (55, 0.013) (200, 
0.013) (550, 0.012) 

Topic 11: (pct, 0.095) (100, 0.037) (200, 0.026) (incr, 0.020) (tranche, 0.014) (55, 0.014) (625, 0.012) (50, 0.012) 
(deal, 0.011) (par, 0.011) 

Topic 12: (news, 0.016) (percent, 0.016) (click, 0.011) (related, 0.011) (reuters, 0.010) (double, 0.009) (shares, 0.009) 
(euro, 0.008) (german, 0.008) (messaging, 0.007) 

Topic 13: (pct, 0.073) (frn, 0.040) (par, 0.025) (200, 0.023) (incr, 0.020) (300, 0.016) (100, 0.015) (150, 0.014) (50, 
0.012) (40, 0.009) 

Topic 14: (hyundai, 0.034) (percent, 0.023) (won, 0.022) (sales, 0.015) (s-korea, 0.014) (market, 0.009) (shares, 
0.009) (kia, 0.009) (seoul, 0.008) (year, 0.008) 

Topic 15: (assigned, 0.074) (series, 0.037) (trust, 0.023) (flind, 0.022) (ptc, 0.020) (flindso, 0.017) (loan, 0.017) 
(reaffirmed, 0.014) (It, 0.012) (bbbind, 0.011) 

Topic 16: (news, 0.050) (top, 0.038) (reuters, 0.022) (japan, 0.018) (stocks, 0.017) (visit, 0.014) (nikkei, 0.013) 
(data, 0.013) (30, 0.013) (markets, 0.012) 

Topic 17: (oct, 0.044) (dec, 0.042) (nov, 0.037) (2009, 0.029) (zar, 0.023) (2010, 0.021) (sep, 0.021) (nzd, 0.018) 
(world-bank, 0.018) (nomura, 0.018) 

Topic 18: (percent, 0.043) (yen, 0.034) (shares, 0.013) (nikkei, 0.010) (financial, 0.010) (stocks, 0.008) (rose, 0.008) 
(bank, 0.008) (banks, 0.008) (group, 0.007) 

Topic 19: (pct, 0.108) (100, 0.062) (incr, 0.043) (eib, 0.028) (kfw, 0.020) (65, 0.017) (70, 0.015) (60, 0.014) (200, 
0.014) (jul2009, 0.013) 

Topic 20: (economy, 0.012) (japan, 0.012) (percent, 0.011) (yen, 0.011) (boj, 0.011) (japanese, 0.010) (japans, 0.008) 
(year, 0.008) (market, 0.007) (government, 0.006) 

Topic 21: (na, 0.051) (hong-kong, 0.013) (china, 0.011) (firm, 0.010) (group, 0.009) (million, 0.008) (corp, 0.007) 
(2008, 0.007) (plans, 0.007) (company, 0.007) 

Topic 22: (percent, 0.032) (yen, 0.022) (stocks, 0.009) (investors, 0.009) (market, 0.008) (nikkei, 0.008) (japans, 
0.008) (shares, 0.008) (tokyo, 0.008) (maker, 0.005) 

Topic 23: (07, 0.029) (pct, 0.019) (terms, 0.016) (08, 0.016) (06, 0.015) (yryr, 0.014) (2003, 0.013) (2006, 0.013) 
(dec, 0.012) (oct, 0.012) 

Topic 24: (yen, 0.045) (percent, 0.038) (gmt, 0.016) (billion, 0.012) (profit, 0.011) (shares, 0.009) (year, 0.006) (fell, 
0.006) (2, 0.005) (maker, 0.005) 

Topic 25: (price, 0.031) (target, 0.030) (expected, 0.022) (issues, 0.021) (corp, 0.020) (outperform, 0.016) (etr, 
0.016) (percentage, 0.015) (points, 0.015) (rating, 0.013) 

Topic 26: (na, 0.029) (zar, 0.024) (2012, 0.022) (aud, 0.022) (sep, 0.020) (2011, 0.019) (daiwa, 0.019) (jul, 0.017) 
(jun, 0.017) (aug, 0.016) 

Topic 27: (1, 0.073) (jpy, 0.037) (2, 0.024) (3, 0.016) (4, 0.014) (corporation, 0.012) (5, 0.010) (hi, 0.010) (6, 0.008) 
(krw, 0.006) 

Topic 28: (percent, 0.016) (motor, 0.014) (year, 0.013) (sales, 0.012) (1, 0.012) (toyota, 0.011) (japan, 0.011) 
(motors, 0.010) (vehicles, 0.010) (heavy, 0.010) 

Topic 29: (team, 0.014) (f1, 0.011) (race, 0.010) (toyota, 0.007) (prix, 0.007) (season, 0.007) (grand, 0.007) (teams, 
0.006) (reuters, 0.006) (year, 0.006) 

Topic 30: (pct, 0.112) (100, 0.083) (60, 0.027) (2004, 0.016) (50, 0.014) (625, 0.013) (2002, 0.012) (2003, 0.011) (75, 
0.011) (250, 0.010) 

Topic 31: (percent, 0.031) (shares, 0.016) (reuters, 0.014) (rm, 0.011) (reutersnet, 0.011) (messaging, 0.011) (euro, 
0.006) (company, 0.005) (index, 0.005) (click, 0.005) 

Topic 32: (apr, 0.028) (bonds, 0.020) (yen, 0.019) (10, 0.016) (sep, 0.016) (date, 0.016) (mizuho, 0.016) (sec, 0.016) 
(nomura, 0.012) (20, 0.012) 

Topic 33: (percent, 0.063) (shares, 0.028) (index, 0.024) (adrs, 0.019) (rose, 0.014) (fell, 0.013) (leading, 0.012) 
(new-york, 0.011) (cents, 0.009) (european, 0.009) 

Topic 34: (gold, 0.012) (prices, 0.011) (market, 0.009) (futures, 0.009) (percent, 0.008) (1, 0.008) (rubber, 0.007) 
(jgb, 0.007) (euro, 0.006) (demand, 0.005) 

Topic 35: (years, 0.005) (china, 0.005) (people, 0.004) (japanese, 0.004) (workers, 0.004) (japan, 0.004) (year, 0.003) 
(time, 0.003) (business, 0.003) (work, 0.002) 

Topic 36: (percent, 0.030) (sales, 0.019) (car, 0.018) (market, 0.016) (year, 0.012) (brand, 0.012) (european, 0.010) 
(cars, 0.009) (europe, 0.009) (registrations, 0.007) 

Topic 37: (2010, 0.035) (gmt, 0.033) (title, 0.030) (reuters, 0.030) (timedate, 0.029) (description, 0.029) (feb, 0.022) 
(toyota, 0.013) (wed, 0.012) (insider, 0.010) 

Topic 38: (pct, 0.062) (frn, 0.048) (100, 0.035) (incr, 0.019) (150, 0.013) (200, 0.013) (500, 0.011) (50, 0.011) (425, 
0.010) (jan2006, 0.010) 

Topic 39: (click, 0.018) (double, 0.014) (percent, 0.009) (report, 0.008) (market, 0.007) (oil, 0.007) (stocks, 0.006) 
(group, 0.006) (latest, 0.006) (euro, 0.005) 

Topic 40: (pct, 0.048) (frn, 0.043) (deal, 0.017) (incr, 0.016) (500, 0.016) (par, 0.012) (100, 0.012) (tranche, 0.012) 
(250, 0.011) (10b, 0.010) 

Topic 41: (debt, 0.015) (rating, 0.015) (million, 0.012) (moodys, 0.009) (ratings, 0.008) (fiscal, 0.008) (bonds, 0.007) 
(tax, 0.007) (state, 0.006) (credit, 0.006) 

Topic 42: (pct, 0.094) (100, 0.041) (200, 0.028) (incr, 0.017) (625, 0.016) (55, 0.015) (tranche, 0.015) (575, 0.012) 
(multi, 0.011) (deal, 0.011) 

Topic 43: (percent, 0.043) (yen, 0.036) (gmt, 0.011) (shares, 0.009) (rose, 0.008) (jal, 0.007) (nikkei, 0.007) (1, 
0.006) (toyota, 0.006) (exporters, 0.005) 

Topic 44: (pct, 0.076) (incr, 0.031) (frn, 0.021) (reoffer, 0.018) (100, 0.017) (eib, 0.013) (150, 0.013) (multi, 0.012) 
(tranche, 0.012) (200, 0.012) 

Topic 45: (pct, 0.107) (incr, 0.019) (100, 0.017) (250, 0.012) (25, 0.011) (200, 0.011) (150, 0.011) (500, 0.009) (275, 
0.008) (deal, 0.008) 

Topic 46: (top, 0.021) (news, 0.015) (auf, 0.009) (weitere, 0.009) (der, 0.007) (und, 0.007) (index, 0.007) (firmen, 
0.007) (unternehmen, 0.007) (die, 0.006) 

Topic 47: (yen, 0.041) (percent, 0.037) (gmt, 0.015) (profit, 0.009) (shares, 0.008) (maker, 0.007) (price, 0.006) 
(toyota, 0.006) (forecast, 0.006) (stocks, 0.006) 

Topic 48: (rec, 0.059) (000, 0.038) (sm, 0.032) (cargill, 0.019) (sb, 0.018) (mz, 0.015) (bunge, 0.015) (china, 0.014) 
(sbo, 0.012) (roads, 0.009) 

Topic 49: (pct, 0.108) (100, 0.054) (incr, 0.043) (kfw, 0.027) (eib, 0.020) (50, 0.020) (rabobank, 0.013) (5125, 0.010) 
(200, 0.008) (70, 0.008) 

Topic 50: (1, 0.016) (sales, 0.010) (car, 0.008) (2, 0.008) (pct, 0.007) (na, 0.007) (vehicles, 0.007) (3, 0.006) (4, 
0.005) (10, 0.005) 

Topic 51: (feb, 0.028) (bonds, 0.023) (sept, 0.022) (yen, 0.020) (10, 0.017) (na, 0.015) (date, 0.014) (coupon, 0.012) 
(mizuho, 0.012) (credit, 0.011) 

Topic 52: (mln, 0.078) (10-yr, 0.021) (bln, 0.018) (bond, 0.017) (company, 0.016) (amt, 0.016) (rtgs, 0.016) (matdebt, 
0.015) (mgrs, 0.015) (sales, 0.015) 

Topic 53: (pct, 0.109) (incr, 0.049) (rabobank, 0.037) (50, 0.033) (100, 0.028) (75, 0.020) (45, 0.017) (cba, 0.013) 
(60, 0.013) (kfw, 0.012) 

Topic 54: (gm, 0.029) (ford, 0.017) (chrysler, 0.014) (auto, 0.011) (billion, 0.008) (automakers, 0.008) (automaker, 
0.006) (bankruptcy, 0.006) (workers, 0.006) (general-motors, 0.005) 

Topic 55: (pct, 0.086) (par, 0.028) (reoffer, 0.025) (10b, 0.023) (frn, 0.022) (500, 0.018) (300, 0.010) (incr, 0.009) 
(200, 0.008) (475, 0.008) 

Topic 56: (hybrid, 0.017) (cars, 0.016) (car, 0.013) (electric, 0.013) (vehicles, 0.011) (toyota, 0.010) (fuel, 0.008) 
(technology, 0.007) (battery, 0.006) (batteries, 0.005) 

Topic 57: (words, 0.035) (moved, 0.022) (update, 0.019) (expect, 0.013) (1, 0.011) (2, 0.009) (business, 0.006) (600, 
0.006) (gmt, 0.006) (3, 0.005) 

Topic 58: (yen, 0.043) (percent, 0.035) (gmt, 0.017) (billion, 0.012) (profit, 0.011) (shares, 0.009) (year, 0.008) 
(forecast, 0.006) (maker, 0.006) (million, 0.006) 

Topic 59: (percent, 0.034) (yen, 0.033) (nikkei, 0.021) (market, 0.011) (shares, 0.010) (rose, 0.008) (investors, 0.007) 
(exporters, 0.007) (average, 0.007) (3, 0.006) 

Topic 60: (financial, 0.013) (crisis, 0.011) (economy, 0.010) (global, 0.009) (cut, 0.009) (banks, 0.008) (economic, 
0.008) (recession, 0.007) (billion, 0.007) (percent, 0.006) 

Topic 61: (japan, 0.026) (production, 0.020) (earthquake, 0.011) (plant, 0.011) (plants, 0.009) (nuclear, 0.009) 
(power, 0.009) (parts, 0.009) (quake, 0.008) (supply, 0.008) 

Topic 62: (pct, 0.110) (incr, 0.060) (50, 0.049) (100, 0.032) (rabobank, 0.025) (60, 0.023) (bng, 0.016) (575, 0.015) 
(kfw, 0.015) (jan2015, 0.010) 

Topic 63: (fixed, 0.019) (tap, 0.015) (westpac, 0.014) (bond, 0.014) (mln, 0.013) (rmbs, 0.012) (undisc, 0.012) (frn, 
0.012) (eurobond, 0.012) (aaaalaa, 0.011) 

Topic 64: (03, 0.030) (02, 0.024) (pct, 0.018) (terms, 0.015) (2002, 0.015) (2001, 0.014) (1997, 0.013) (yryr, 0.013) 
(1999, 0.012) (dec, 0.012) 

Topic 65: (pct, 0.107) (100, 0.059) (incr, 0.041) (kfw, 0.025) (eib, 0.023) (60, 0.013) (200, 0.012) (65, 0.011) (70, 
0.011) (jul2009, 0.010) 

Topic 66: (auto, 0.012) (sales, 0.011) (sees, 0.009) (stories, 0.008) (gm, 0.007) (car, 0.007) (europe, 0.006) (toyota, 
0.005) (news, 0.005) 

Topic 67: (percent, 0.032) (dollar, 0.013) (stocks, 0.010) (markets, 0.009) (index, 0.008) (shares, 0.008) (investors, 
0.008) (asian, 0.008) (fell, 0.007) (yen, 0.007) 

Topic 68: (05, 0.025) (04, 0.024) (pct, 0.019) (terms, 0.016) (06, 0.015) (yryr, 0.014) (2003, 0.013) (2004, 0.013) 
(03, 0.013) (oct, 0.013) 

Topic 69: (nikkei, 0.019) (japan, 0.012) (market, 0.011) (stocks, 0.010) (yen, 0.008) (business, 0.007) (billion, 0.006) 
(friday, 0.006) (percent, 0.006) (daily, 0.005) 

Topic 70: (2010, 0.036) (aug, 0.034) (world-bank, 0.030) (2009, 0.029) (na, 0.026) (jun, 0.024) (rand, 0.022) (mizuho, 
0.020) (jul, 0.019) (feb, 0.019) 

Topic 71: (reuters, 0.007) (company, 0.006) (business, 0.006) (million, 0.006) (billion, 0.005) (press, 0.004) (stories, 
0.004) (companies, 0.004) (sony, 0.004) (executive, 0.003) 

Topic 72: (yen, 0.047) (percent, 0.030) (billion, 0.015) (gmt, 0.015) (profit, 0.009) (year, 0.008) (million, 0.007) 
(shares, 0.007) (company, 0.007) (japans, 0.006) 

Topic 73: (percent, 0.030) (yen, 0.023) (shares, 0.011) (nikkei, 0.011) (investors, 0.010) (earnings, 0.010) (market, 
0.009) (rose, 0.006) (1, 0.006) (stocks, 0.006) 

Topic 74: (yen, 0.045) (percent, 0.038) (gmt, 0.016) (profit, 0.009) (billion, 0.008) (shares, 0.007) (1, 0.006) (million, 
0.005) (maker, 0.005) (forecast, 0.005) 

Topic 75: (nihon, 0.012) (keizai, 0.012) (news, 0.011) (page, 0.011) (asahi, 0.011) (mainichi, 0.010) (yomiuri, 0.010) 
(japan, 0.010) (australian, 0.007) (japanese, 0.007) 

Topic 76: (pct, 0.097) (100, 0.037) (incr, 0.035) (50, 0.035) (575, 0.026) (bng, 0.017) (55, 0.017) (200, 0.014) (frn, 
0.011) (550, 0.011) 

Topic 77: (percent, 0.038) (yen, 0.027) (nikkei, 0.013) (market, 0.011) (stocks, 0.010) (shares, 0.010) (1, 0.008) 

(investors, 0.007) (rose, 0.006) (feU, 0.006) 

Topic 78: (yen, 0.048) (percent, 0.043) (gmt, 0.020) (shares, 0.011) (profit, 0.009) (billion, 0.009) (rose, 0.008) (1, 
0.008) (nikkei, 0.006) (fell, 0.006) 

Topic 79: (aaaaaa, 0.023) (rmbs, 0.019) (fixed, 0.018) (euro, 0.014) (100, 0.011) (frn, 0.011) (td, 0.010) (sec, 0.009) 
(aaa, 0.009) (aaaaaaaaa, 0.008) 

Topic 80: (yen, 0.023) (percent, 0.018) (billion, 0.010) (gmt, 0.008) (profit, 0.008) (earnings, 0.008) (forecast, 0.006) 
(year, 0.005) (company, 0.005) (cut, 0.005) 

Topic 81: (toyota, 0.022) (percent, 0.021) (year, 0.019) (000, 0.015) (sales, 0.014) (plant, 0.012) (car, 0.012) (million, 
0.010) (production, 0.010) (units, 0.009) 

Topic 82: (percent, 0.029) (points, 0.015) (index, 0.012) (new-york, 0.010) (gold, 0.009) (1, 0.007) (closed, 0.007) 
(prices, 0.006) (london, 0.006) (crude, 0.006) 

Topic 83: (pct, 0.106) (100, 0.059) (incr, 0.035) (425, 0.017) (50, 0.015) (ontario, 0.014) (200, 0.013) (525, 0.011) 
(40, 0.010) (dec2007, 0.010) 

Topic 84: (sales, 0.060) (percent, 0.032) (ford, 0.017) (year, 0.013) (toyota, 0.011) (auto, 0.011) (market, 0.010) 
(gm, 0.009) (vehicles, 0.009) (million, 0.008) 

Topic 85: (profit, 0.028) (yen, 0.026) (billion, 0.025) (percent, 0.023) (year, 0.023) (toyota, 0.018) (sales, 0.015) 
(operating, 0.013) (forecast, 0.011) (earnings, 0.009) 

Topic 86: (yen, 0.045) (percent, 0.035) (profit, 0.016) (gmt, 0.015) (billion, 0.014) (year, 0.010) (forecast, 0.008) 
(maker, 0.007) (sales, 0.006) (group, 0.006) 

Topic 87: (pct, 0.104) (100, 0.031) (incr, 0.030) (200, 0.019) (50, 0.018) (rabobank, 0.016) (ge-capital, 0.015) 
(tranche, 0.014) (150, 0.011) (feb2012, 0.011) 

Topic 88: (germany, 0.021) (1, 0.017) (ferrari, 0.017) (2, 0.014) (3, 0.014) (renault, 0.014) (toyota, 0.014) (britain, 
0.014) (uk, 0.014) (4, 0.014) 

Topic 89: (rmbs, 0.024) (fixed, 0.021) (frn, 0.017) (aaaaaa-, 0.013) (eurobond, 0.013) (aaaaaaaaa, 0.012) (anz, 0.010) 
(tap, 0.009) (200, 0.008) (2007-1, 0.008) 

Topic 90: (yen, 0.038) (percent, 0.028) (billion, 0.013) (gmt, 0.012) (profit, 0.008) (group, 0.008) (maker, 0.007) 
(year, 0.007) (shares, 0.007) (million, 0.006) 

Topic 91: (car, 0.010) (fiat, 0.010) (euro, 0.008) (renault, 0.008) (percent, 0.008) (europe, 0.005) (reuters, 0.005) 
(volkswagen, 0.005) (german, 0.005) (industry, 0.005) 

Topic 92: (percent, 0.020) (yen, 0.020) (electronics, 0.015) (billion, 0.011) (nec, 0.009) (year, 0.009) (maker, 0.008) 
(sanyo, 0.007) (shares, 0.007) (corp, 0.007) 

Topic 93: (words, 0.032) (moved, 0.021) (update, 0.016) (1, 0.010) (expect, 0.009) (2, 0.007) (london, 0.006) (600, 
0.006) (gmt, 0.005) (3, 0.005) 

Topic 94: (words, 0.036) (moved, 0.031) (pix, 0.012) (update, 0.011) (tv, 0.009) (1, 0.006) (600, 0.006) (president, 
0.005) (2, 0.004) (700, 0.004) 

Topic 95: (dec, 0.037) (nov, 0.037) (bonds, 0.021) (10, 0.018) (yen, 0.017) (mar, 0.013) (nomura, 0.013) (mizuho, 
0.013) (date, 0.012) (na, 0.011) 

Topic 96: (reuters, 0.019) (messaging, 0.017) (percent, 0.017) (stocks, 0.016) (click, 0.014) (shares, 0.014) 
(reutersnet, 0.011) (rm, 0.011) (double, 0.010) (stock, 0.010) 

Topic 97: (steel, 0.065) (percent, 0.017) (nippon, 0.016) (prices, 0.015) (demand, 0.012) (year, 0.010) (price, 0.009) 
(i5401ti, 0.008) (nippon-steel, 0.007) (japanese, 0.007) 

Topic 98: (bonds, 0.028) (yen, 0.021) (10, 0.017) (na, 0.014) (jul, 0.014) (date, 0.013) (oct, 0.012) (20, 0.010) (corp, 
0.010) (mizuho, 0.010) 

Topic 99: (pct, 0.035) (company, 0.024) (earnings, 0.020) (reported, 0.019) (toyota, 0.013) (expectations, 0.013) 
(year, 0.010) (corp, 0.010) (results, 0.010) (shares, 0.010) 

FIG. 12: Time evolution of news volume for all the learned topics for "Toyota". Horizontal axis denotes days (Jan 2003 to 
June 2011) and vertical axis shows their unnormalized volume. 



